Analysis of the results by county

This is the analysis by county. The main notebook of the whole analysis is located at Analysis.ipynb.

Import libraries and modules

We will make use of the following libraries in this notebook:

In [1]:
import pandas as pd
import json
import plotly.express as px
import scipy.stats as sp
import numpy as np
from IPython.display import display, Markdown

We also import our own constants and functions.

In [2]:
from own_data import candidates, candidates_colors, poland_center, poland_zoom, map_margin, opacity
from utils import comma_to_dot, get_last_name

Parse the results data

We read the CSV file with the results by county, given as percentages. The data comes from the website of the National Electoral Commission. Because Poland uses the comma as the decimal separator, we convert the values to dot-separated numbers so that the libraries handle them as numbers.

In [3]:
results_counties_percent_df = pd.read_csv('data/results/results_by_county_percent.csv', sep=';')
results_counties_percent_df = results_counties_percent_df[['Kod TERYT', 'Powiat'] + candidates]

for candidate in candidates:
    results_counties_percent_df[candidate] = results_counties_percent_df[candidate].map(comma_to_dot)
In [4]:
results_counties_percent_df.head()
Out[4]:
Kod TERYT Powiat Robert BIEDROŃ Krzysztof BOSAK Andrzej Sebastian DUDA Szymon Franciszek HOŁOWNIA Marek JAKUBIAK Władysław Marcin KOSINIAK-KAMYSZ Mirosław Mariusz PIOTROWSKI Paweł Jan TANAJNO Rafał Kazimierz TRZASKOWSKI Waldemar Włodzimierz WITKOWSKI Stanisław Józef ŻÓŁTEK
0 20100 bolesławiecki 2.08 6.72 41.56 16.39 0.13 1.54 0.07 0.12 31.03 0.13 0.24
1 20200 dzierżoniowski 2.16 6.09 44.03 13.67 0.12 1.48 0.10 0.18 31.87 0.10 0.19
2 20300 głogowski 2.33 5.95 44.19 14.21 0.10 1.75 0.10 0.14 30.94 0.10 0.20
3 20400 górowski 2.00 6.20 50.04 13.14 0.14 2.59 0.08 0.10 25.49 0.07 0.15
4 20500 jaworski 2.05 6.55 47.78 11.18 0.12 3.08 0.12 0.09 28.77 0.06 0.19
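
The helper comma_to_dot lives in utils and is not shown in this notebook. As a rough illustration only, a conversion of this kind could look like the sketch below. The name comma_to_dot_sketch and its exact behavior are assumptions, not the actual implementation.

```python
def comma_to_dot_sketch(value):
    """Hypothetical stand-in for utils.comma_to_dot: '41,56' -> 41.56."""
    # Cast to str first so that values already parsed as numbers also work.
    return float(str(value).replace(',', '.'))


print(comma_to_dot_sketch('41,56'))  # 41.56
print(comma_to_dot_sketch(7))        # 7.0
```

pandas can also perform this conversion while reading the file, through the decimal=',' argument of pd.read_csv.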

Parse the geographical data

Additionally, we import the geographical data describing the borders of each county. The data is derived from the Head Office of Geodesy and Cartography. The website of GIS Support PL lets us download a package containing only the counties. To create the maps we will use the GeoJSON format. The data from the websites mentioned above comes as .shp files, so we converted it to GeoJSON using MapShaper.

In [5]:
with open('data/geojson/counties.json', encoding='utf-8') as response:
    counties = json.load(response)
In [6]:
counties['features'][0]['properties']
Out[6]:
{'JPT_SJR_KO': 'POW',
 'JPT_KOD_JE': '1815',
 'JPT_NAZWA_': 'powiat ropczycko-sędziszowski',
 'JPT_ORGAN_': '',
 'JPT_JOR_ID': 13415,
 'WERSJA_OD': '2012-09-26T00:00:00.000Z',
 'WERSJA_DO': '1899-11-30T00:00:00.000Z',
 'WAZNY_OD': '2012-09-26T00:00:00.000Z',
 'WAZNY_DO': '1899-11-30T00:00:00.000Z',
 'JPT_KOD__1': '',
 'JPT_NAZWA1': '',
 'JPT_ORGAN1': 'NZN',
 'JPT_WAZNA_': 'NZN',
 'ID_BUFORA_': 13878,
 'ID_BUFORA1': 0,
 'ID_TECHNIC': 829084,
 'IIP_PRZEST': 'PL.PZGIK.200',
 'IIP_IDENTY': 'e86b1e71-8958-42ee-bec5-ca3c87907bc8',
 'IIP_WERSJA': '2012-09-27T08:01:01+02:00',
 'JPT_KJ_IIP': 'EGIB',
 'JPT_KJ_I_1': '1815',
 'JPT_KJ_I_2': '',
 'JPT_OPIS': '',
 'JPT_SPS_KO': 'UZG',
 'ID_BUFOR_1': 0,
 'JPT_ID': 829084,
 'JPT_KJ_I_3': '',
 'Shape_Leng': 1.77616265095,
 'Shape_Area': 0.0688251674637}

Integrate the two data sets

The TERYT code is a unique code of each administrative unit. In the election results the code has two extra trailing zeros. Additionally, it lacks the leading zero when the voivodeship number consists of a single digit. We fix both issues so that the two data sets can be joined.

In [7]:
def fix_teryt_county(teryt):
    """Fix TERYT code to integrate the two datasets for counties."""
    teryt = str(teryt)
    
    if len(teryt) == 5:
        teryt = '0' + teryt
    
    return teryt[:-2]
In [8]:
results_counties_percent_df['Kod TERYT'] = results_counties_percent_df['Kod TERYT'].astype(str).map(fix_teryt_county)
In [9]:
results_counties_percent_df.head()
Out[9]:
Kod TERYT Powiat Robert BIEDROŃ Krzysztof BOSAK Andrzej Sebastian DUDA Szymon Franciszek HOŁOWNIA Marek JAKUBIAK Władysław Marcin KOSINIAK-KAMYSZ Mirosław Mariusz PIOTROWSKI Paweł Jan TANAJNO Rafał Kazimierz TRZASKOWSKI Waldemar Włodzimierz WITKOWSKI Stanisław Józef ŻÓŁTEK
0 0201 bolesławiecki 2.08 6.72 41.56 16.39 0.13 1.54 0.07 0.12 31.03 0.13 0.24
1 0202 dzierżoniowski 2.16 6.09 44.03 13.67 0.12 1.48 0.10 0.18 31.87 0.10 0.19
2 0203 głogowski 2.33 5.95 44.19 14.21 0.10 1.75 0.10 0.14 30.94 0.10 0.20
3 0204 górowski 2.00 6.20 50.04 13.14 0.14 2.59 0.08 0.10 25.49 0.07 0.15
4 0205 jaworski 2.05 6.55 47.78 11.18 0.12 3.08 0.12 0.09 28.77 0.06 0.19
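
The normalization can be traced on single codes. The sketch below is an equivalent formulation based on str.zfill; normalize_teryt is an illustrative name and not part of the notebook.

```python
def normalize_teryt(code):
    """Pad a county TERYT code to 6 digits, then drop the two trailing zeros."""
    padded = str(code).zfill(6)  # 20100 -> '020100' (restores the leading zero)
    return padded[:-2]           # '020100' -> '0201'


print(normalize_teryt(20100))   # 0201
print(normalize_teryt(181500))  # 1815
```

A six-digit code such as 181500 is left unpadded, so only the trailing zeros are removed.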

This is where the key that joins our two data sets is located in the counties GeoJSON:

In [10]:
counties['features'][0]['properties']['JPT_KOD_JE']
Out[10]:
'1815'
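
Before plotting, it can be worth checking that every code in the results has a matching feature in the GeoJSON, since unmatched rows would simply be missing from the map. The following is a minimal sketch on toy data; unmatched_codes and toy_geojson are hypothetical names, and in the notebook the inputs would be the 'Kod TERYT' column and counties.

```python
def unmatched_codes(result_codes, geojson, key='JPT_KOD_JE'):
    """Return the result codes that have no feature with the same key in the GeoJSON."""
    geo_codes = {feature['properties'][key] for feature in geojson['features']}
    return sorted(set(result_codes) - geo_codes)


# Toy data for illustration only.
toy_geojson = {'features': [{'properties': {'JPT_KOD_JE': '0201'}},
                            {'properties': {'JPT_KOD_JE': '1815'}}]}
print(unmatched_codes(['0201', '1815', '9999'], toy_geojson))  # ['9999']
```

For the real data, the rows for 'statki' and 'zagranica' would be expected to show up here, because they have no county border.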

Plot maps

We finally plot the data on maps.

In [11]:
def get_figure_results_by_county(candidate):
    """Get a figure showing a map of the results of the given candidate by county."""
    candidate_df = results_counties_percent_df[['Kod TERYT', 'Powiat', candidate]]
    
    # We remove the results from ships and abroad because they will not be shown on the map.
    candidate_df = candidate_df[candidate_df.Powiat != 'statki']
    candidate_df = candidate_df[candidate_df.Powiat != 'zagranica']
    
    fig = px.choropleth_mapbox(
        candidate_df, geojson=counties, color=candidate,
        locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
        center=poland_center,
        opacity=opacity, color_continuous_scale=candidates_colors[candidate],
        hover_data={'Powiat': True, 'Kod TERYT': False},
        mapbox_style="carto-positron", zoom=poland_zoom
    )
    
    fig.update_layout(margin=map_margin)
    
    return fig
In [12]:
for candidate in candidates:
    display(Markdown(f'### Results of {candidate} by county'))
    get_figure_results_by_county(candidate).show()

Results of Robert BIEDROŃ by county

Results of Krzysztof BOSAK by county

Results of Andrzej Sebastian DUDA by county

Results of Szymon Franciszek HOŁOWNIA by county

Results of Marek JAKUBIAK by county

Results of Władysław Marcin KOSINIAK-KAMYSZ by county

Results of Mirosław Mariusz PIOTROWSKI by county

Results of Paweł Jan TANAJNO by county

Results of Rafał Kazimierz TRZASKOWSKI by county

Results of Waldemar Włodzimierz WITKOWSKI by county

Results of Stanisław Józef ŻÓŁTEK by county

Who won in each county?

Find the winner in each county

In [13]:
winners_counties_df = pd.concat([
    results_counties_percent_df[candidates].idxmax(axis=1).rename('Winner').to_frame(),
    results_counties_percent_df[candidates].max(axis=1).rename('Result').to_frame(),
    results_counties_percent_df[['Powiat', 'Kod TERYT']]
], axis=1)
In [14]:
winners_counties_df.head(1)
Out[14]:
Winner Result Powiat Kod TERYT
0 Andrzej Sebastian DUDA 41.56 bolesławiecki 0201
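
How idxmax and max work together can be seen on a toy table. The column labels A, B, C and the county names below are made up for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {'A': [41.56, 20.0], 'B': [31.03, 55.5], 'C': [16.39, 24.5]},
    index=['county_1', 'county_2'],
)

# idxmax(axis=1) returns the column label of the row-wise maximum,
# max(axis=1) returns the maximum value itself.
summary = pd.DataFrame({
    'Winner': toy_df.idxmax(axis=1),
    'Result': toy_df.max(axis=1),
})
print(summary)
```

If two columns share the maximum in a row, idxmax reports the first of them.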

Plot the map

In [15]:
winners_counties_fig = px.choropleth_mapbox(
    winners_counties_df, geojson=counties, color='Winner',
    locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
    center=poland_center,
    opacity=opacity, color_discrete_sequence=px.colors.qualitative.D3,
    hover_data={'Powiat': True, 'Kod TERYT': False, 'Result': True},
    mapbox_style="carto-positron", zoom=poland_zoom
)

winners_counties_fig.update_layout(margin=map_margin)

winners_counties_fig.show()

And what about the second place?

Find second results

In [16]:
values = results_counties_percent_df[candidates].values
first_third_highest = values[
    np.arange(len(results_counties_percent_df))[:,None],np.argpartition(-values,np.arange(4),axis=1)[:,:4]
]
second_values = first_third_highest[:,1]
In [17]:
second_values[:5]
Out[17]:
array([31.03, 31.87, 30.94, 25.49, 28.77])
In [18]:
second_values_df = pd.DataFrame(second_values, columns=['Result'])
In [19]:
second_values_df.head()
Out[19]:
Result
0 31.03
1 31.87
2 30.94
3 25.49
4 28.77
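
The indexing trick above is compact, so here is a small sketch of the same idea on a toy array. It uses np.take_along_axis instead of the explicit row index; toy_values is made-up data.

```python
import numpy as np

toy_values = np.array([[10.0, 40.0, 30.0, 20.0],
                       [5.0, 1.0, 9.0, 7.0]])

# Passing kth=[0, 1, 2] puts the three smallest entries of -toy_values
# (the three largest of toy_values) into their sorted positions at the front.
top3_idx = np.argpartition(-toy_values, np.arange(3), axis=1)[:, :3]
top3 = np.take_along_axis(toy_values, top3_idx, axis=1)

second_highest = top3[:, 1]  # 30.0 for the first row, 7.0 for the second
print(second_highest)
```

Column 0 of top3 holds the highest value per row, column 1 the second highest, and column 2 the third highest.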

Match them with the candidates

In [20]:
def get_col_name(row):
    """Get the name of the candidate whose result in this county equals row['Result']."""
    # Compare only the candidate columns; on a tie the first matching candidate is returned.
    b = (results_counties_percent_df.loc[row.name, candidates] == row['Result'])
    return b.index[b.argmax()]
In [21]:
second_places_df = pd.concat([
    second_values_df.apply(get_col_name, axis=1).rename('Second place').to_frame(),
    second_values_df,
    results_counties_percent_df[['Powiat', 'Kod TERYT']]
], axis=1)
In [22]:
second_places_df.head()
Out[22]:
Second place Result Powiat Kod TERYT
0 Rafał Kazimierz TRZASKOWSKI 31.03 bolesławiecki 0201
1 Rafał Kazimierz TRZASKOWSKI 31.87 dzierżoniowski 0202
2 Rafał Kazimierz TRZASKOWSKI 30.94 głogowski 0203
3 Rafał Kazimierz TRZASKOWSKI 25.49 górowski 0204
4 Rafał Kazimierz TRZASKOWSKI 28.77 jaworski 0205
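
Matching a value back to a column returns the first column holding that value, so if two candidates had exactly the same result in a county, the same name could be reported for two places. An alternative, sketched below on toy data, ranks the column indices directly and never searches by value. The labels in names are hypothetical.

```python
import numpy as np

names = np.array(['A', 'B', 'C'])               # hypothetical candidate labels
toy_values = np.array([[41.56, 31.03, 16.39],
                       [10.00, 10.00, 5.00]])   # the second row contains a tie

# A stable sort of the negated values ranks the columns from highest to lowest
# and keeps the original column order among equal values.
order = np.argsort(-toy_values, axis=1, kind='stable')
second_idx = order[:, 1]

second_names = names[second_idx]
second_vals = toy_values[np.arange(len(toy_values)), second_idx]
print(second_names, second_vals)
```

In the tied row, A takes first place and B second, whereas a lookup by value would name A for both.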

Plot the map

In [23]:
second_places_fig = px.choropleth_mapbox(
    second_places_df, geojson=counties, color='Second place',
    locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
    center=poland_center, opacity=opacity,
    color_discrete_sequence=['#FF7F0E', '#1F77B4', 'rgb(102,102,102)', 'rgb(255,217,47)'],
    hover_data={'Powiat': True, 'Kod TERYT': False, 'Result': True},
    mapbox_style="carto-positron", zoom=poland_zoom
)

second_places_fig.update_layout(margin=map_margin)

second_places_fig.show()

Last but not least – the third place

In [24]:
third_values = first_third_highest[:,2]
third_values_df = pd.DataFrame(third_values, columns=['Result'])

third_places_df = pd.concat([
    third_values_df.apply(get_col_name, axis=1).rename('Third place').to_frame(),
    third_values_df,
    results_counties_percent_df[['Powiat', 'Kod TERYT']]
], axis=1)
In [25]:
third_places_df.head()
Out[25]:
Third place Result Powiat Kod TERYT
0 Szymon Franciszek HOŁOWNIA 16.39 bolesławiecki 0201
1 Szymon Franciszek HOŁOWNIA 13.67 dzierżoniowski 0202
2 Szymon Franciszek HOŁOWNIA 14.21 głogowski 0203
3 Szymon Franciszek HOŁOWNIA 13.14 górowski 0204
4 Szymon Franciszek HOŁOWNIA 11.18 jaworski 0205

Plot the map

In [26]:
third_places_fig = px.choropleth_mapbox(
    third_places_df, geojson=counties, color='Third place',
    locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
    center=poland_center, opacity=opacity,
    color_discrete_sequence=['rgb(255,217,47)', 'rgb(102,102,102)', '#FF7F0E'],
    hover_data={'Powiat': True, 'Kod TERYT': False, 'Result': True},
    mapbox_style="carto-positron", zoom=poland_zoom
)

third_places_fig.update_layout(margin=map_margin)

third_places_fig.show()

Disproportions between results

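Looking at these maps, one can see that the voters of some candidates are spread fairly evenly across the whole country, while other candidates have much greater support in particular regions. Which candidate has the most evenly spread electorate? To answer this, we compute the coefficient of variation (the standard deviation divided by the mean) of each candidate's county-level results; the lower the value, the more even the support.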

In [27]:
coefficient_of_variation_df = pd.DataFrame(
    results_counties_percent_df[candidates].apply(sp.variation)
).sort_values(by=0).transpose()

coefficient_of_variation_df
Out[27]:
Krzysztof BOSAK Stanisław Józef ŻÓŁTEK Andrzej Sebastian DUDA Szymon Franciszek HOŁOWNIA Paweł Jan TANAJNO Robert BIEDROŃ Władysław Marcin KOSINIAK-KAMYSZ Rafał Kazimierz TRZASKOWSKI Mirosław Mariusz PIOTROWSKI Waldemar Włodzimierz WITKOWSKI Marek JAKUBIAK
0 0.197774 0.214243 0.244968 0.256535 0.26028 0.310794 0.321589 0.346106 0.372991 0.378609 0.496703
In [28]:
coefficient_of_variation_df = coefficient_of_variation_df.transpose().reset_index()
coefficient_of_variation_df.columns = ['Candidate', 'Coefficient of variation']
coefficient_of_variation_df['Candidate'] = coefficient_of_variation_df['Candidate'].apply(get_last_name)
In [29]:
coefficient_of_variation_fig = px.bar(
    coefficient_of_variation_df, x='Candidate', y='Coefficient of variation',
    color='Coefficient of variation', color_continuous_scale=px.colors.diverging.RdYlGn[::-1],
    title='Coefficient of variation of voters by county',
)

coefficient_of_variation_fig.show()

As we can see, Krzysztof Bosak is the most evenly supported candidate in Poland. He is followed by Stanisław Żółtek and Andrzej Duda. Rafał Trzaskowski is 8th in this comparison, and Marek Jakubiak is at the end of the list.
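
For reference, the coefficient of variation is the standard deviation divided by the mean. With its default settings, scipy.stats.variation uses the population standard deviation (ddof=0), which the standard-library sketch below reproduces. The two lists are made-up shares for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


even = [10.0, 10.5, 9.5, 10.0]   # similar support everywhere
uneven = [2.0, 18.0, 5.0, 15.0]  # support concentrated in some counties
print(coefficient_of_variation(even) < coefficient_of_variation(uneven))  # True
```

Both lists have the same mean, so the difference comes entirely from how far the individual values stray from it.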

How many voters will not have their first-choice candidate in the second round?

The crucial challenge that Andrzej Duda and Rafał Trzaskowski face in the second round is to convince the voters who did not vote for them in the first round. Which counties have the most voters to convince? In other words, which counties should the two candidates focus on most in their campaigns?

We first find the number of voters of the other candidates in each county.

In [30]:
results_counties_df = pd.read_csv('data/results/results_by_county.csv', sep=';')
In [31]:
candidates_2nd_round = ['Andrzej Sebastian DUDA', 'Rafał Kazimierz TRZASKOWSKI']

candidates_no_2nd_round = [
    candidate 
    for candidate in candidates
    if candidate not in candidates_2nd_round
]

candidates_no_2nd_round_df = pd.DataFrame(results_counties_df[candidates_no_2nd_round].sum(axis=1))
candidates_no_2nd_round_df.columns = ['Other electorate']

results_potential_2nd_round_df = pd.concat(
    [results_counties_df[['Powiat', 'Kod TERYT']], candidates_no_2nd_round_df], axis=1
)

results_potential_2nd_round_df['Kod TERYT'] = results_potential_2nd_round_df['Kod TERYT'].astype(str).map(fix_teryt_county)

results_potential_2nd_round_df.head()
Out[31]:
Powiat Kod TERYT Other electorate
0 bolesławiecki 0201 11250
1 dzierżoniowski 0202 11167
2 głogowski 0203 10451
3 górowski 0204 3573
4 jaworski 0205 5467

We plot it.

In [32]:
# We remove the results from ships and abroad because they will not be shown on the map
results_potential_2nd_round_df = results_potential_2nd_round_df[results_potential_2nd_round_df.Powiat != 'statki']
results_potential_2nd_round_df = results_potential_2nd_round_df[results_potential_2nd_round_df.Powiat != 'zagranica']

results_potential_2nd_round_fig = px.choropleth_mapbox(
    results_potential_2nd_round_df, geojson=counties, color='Other electorate',
    locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
    center={"lat": 52, "lon": 19.1451},
    opacity=0.8, color_continuous_scale=px.colors.sequential.Reds,
    hover_data={'Powiat': True, 'Kod TERYT': False},
    mapbox_style="carto-positron", zoom=5.2
)

results_potential_2nd_round_fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
results_potential_2nd_round_fig.show()

These voters are mainly in the big cities, since absolute counts favor populous counties. It may be more informative to look at the share of these voters in each county.

In [33]:
candidates_no_2nd_round_percent_df = pd.DataFrame(results_counties_percent_df[candidates_no_2nd_round].sum(axis=1))
candidates_no_2nd_round_percent_df.columns = ['Other electorate [%]']

results_potential_2nd_round_percent_df = pd.concat(
    [results_counties_df[['Powiat', 'Kod TERYT']], candidates_no_2nd_round_percent_df], axis=1
)

results_potential_2nd_round_percent_df['Kod TERYT'] = \
    results_potential_2nd_round_percent_df['Kod TERYT'].astype(str).map(fix_teryt_county)

# We remove the results from ships and abroad because they will not be shown on the map
results_potential_2nd_round_percent_df = \
    results_potential_2nd_round_percent_df[results_potential_2nd_round_percent_df.Powiat != 'statki']
results_potential_2nd_round_percent_df = \
    results_potential_2nd_round_percent_df[results_potential_2nd_round_percent_df.Powiat != 'zagranica']

results_potential_2nd_round_percent_fig = px.choropleth_mapbox(
    results_potential_2nd_round_percent_df, geojson=counties, color='Other electorate [%]',
    locations='Kod TERYT', featureidkey="properties.JPT_KOD_JE",
    center=poland_center,
    opacity=0.8, color_continuous_scale=px.colors.sequential.Reds,
    hover_data={'Powiat': True, 'Kod TERYT': False},
    mapbox_style="carto-positron", zoom=poland_zoom
)

results_potential_2nd_round_percent_fig.update_layout(margin=map_margin)

results_potential_2nd_round_percent_fig.show()

The resulting map is somewhat similar to the map of votes for Rafał Trzaskowski. This suggests that he may gain more new voters in the second round.
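
The visual similarity between two maps could also be expressed as a number, for instance with the Pearson correlation between the two county-level series. The sketch below is only an illustration of that idea on made-up shares; pearson is a hypothetical helper, and in the notebook the inputs would be the 'Other electorate [%]' column and the candidate's column from results_counties_percent_df.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Made-up shares for four counties, for illustration only.
other_share = [22.0, 27.5, 31.0, 35.5]
candidate_share = [20.0, 28.0, 33.0, 41.0]
print(round(pearson(other_share, candidate_share), 2))
```

A value close to 1 would support the impression from the maps, while a value near 0 would not. A correlation between counties says nothing certain about how individual voters will behave.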